local valley


On ranking via sorting by estimated expected utility

Neural Information Processing Systems

Since utilities can serve as target values to learn the scoring function through square loss regression, the optimality of sorting by expected utilities is equivalent to the consistency of regression.
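The score-and-sort recipe described in this snippet can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's method: each item has a single scalar feature, the scoring function is linear and fitted by ordinary least squares (square loss) on observed utilities, and the data are made up. The names `fit_least_squares` and `rank_by_score` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit the score s(x) = w * x + b to observed utilities by minimizing square loss."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def rank_by_score(items: Sequence[str], feats: Sequence[float], w: float, b: float) -> List[str]:
    """Score-and-sort: order items by decreasing estimated expected utility."""
    scored = [(w * f + b, item) for item, f in zip(items, feats)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


if __name__ == "__main__":
    # Hypothetical training data: each feature value is observed twice with
    # noisy utilities, so the expected utility at x is the average of its two labels.
    xs = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
    ys = [0.0, 1.0, 1.0, 2.0, 2.0, 3.0]
    w, b = fit_least_squares(xs, ys)
    print(w, b)  # → 1.0 -0.5
    print(rank_by_score(["a", "b", "c"], [2.0, 3.0, 1.0], w, b))  # → ['b', 'a', 'c']
```

In this toy case the fitted scores (0.5, 1.5, 2.5) coincide with the per-feature average utilities, which is the sense in which square-loss regression estimates expected utility. When the model class cannot represent the conditional expectation, the fit need not match it, and whether the induced ranking is still optimal is the kind of question the paper studies.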


On ranking via sorting by estimated expected utility

Neural Information Processing Systems

This paper addresses the question of which of these tasks are asymptotically solved by sorting in decreasing order of expected utility, for some suitable notion of utility, or, equivalently, when square loss regression is consistent for ranking via score-and-sort.


Towards Sampling from Nondirected Probabilistic Graphical models using a D-Wave Quantum Annealer

arXiv.org Machine Learning

A D-Wave quantum annealer (QA) with a 2048-qubit lattice, with no missing qubits or couplings, allowed a complete graph of a Restricted Boltzmann Machine (RBM) to be embedded. The RBM was trained with classical Contrastive Divergence on the OptDigits handwritten-digit data set, using 8×7 pixels as visible units. The classically trained RBM was then embedded into the D-Wave lattice to demonstrate that the QA offers a high-efficiency alternative to classical Markov Chain Monte Carlo (MCMC), both for reconstructing missing labels of test images and as a generative model. At every training iteration, the classification error of the D-Wave-based classifier was less than half that of MCMC. The main goal of this study was to investigate the quality of samples drawn from the RBM model distribution and to compare them with classical MCMC samples. For the OptDigits data set, the states in the D-Wave sample belonged to about twice as many local valleys as those in the MCMC sample. All of the lowest-energy (highest joint probability) local minima in the MCMC sample were also found by the D-Wave. The D-Wave missed many of the higher-energy local valleys, but found many "new" local valleys that the MCMC consistently missed. These "new" local valleys were shown to be important for the model distribution in terms of the energy of the corresponding local minima, the width of the local valleys, and the height of the escape barrier.
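The notions of local minima and local valleys in this abstract can be made concrete for a tiny RBM. The sketch below rests on assumptions of our own: binary units in {0, 1}; the standard RBM energy E(v, h) = -a·v - b·h - vᵀWh, so lower energy means higher joint probability (p(v, h) ∝ exp(-E)); a local minimum is a joint state that no single-bit flip can lower; and a state's local valley is identified by the minimum that steepest single-flip descent reaches from it. The paper's exact definitions of valleys, valley width and escape barriers may differ, and all function names and weights here are hypothetical.

```python
from typing import Sequence, Tuple

State = Tuple[int, ...]  # visible bits followed by hidden bits


def energy(state: Sequence[int], a: Sequence[float], b: Sequence[float],
           W: Sequence[Sequence[float]]) -> float:
    """Standard RBM energy E(v, h) = -a.v - b.h - v^T W h for binary units."""
    nv = len(a)
    v, h = state[:nv], state[nv:]
    e = -sum(ai * vi for ai, vi in zip(a, v))
    e -= sum(bj * hj for bj, hj in zip(b, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(nv) for j in range(len(b)))
    return e


def descend(state: Sequence[int], a, b, W) -> State:
    """Steepest single-bit-flip descent; returns the local minimum reached.

    Energy strictly decreases at each step, so this terminates on a finite
    state space. Ties between equally good flips go to the lowest bit index.
    """
    current: State = tuple(state)
    e_cur = energy(current, a, b, W)
    while True:
        best, e_best = None, e_cur
        for k in range(len(current)):
            cand = current[:k] + (1 - current[k],) + current[k + 1:]
            e = energy(cand, a, b, W)
            if e < e_best:
                best, e_best = cand, e
        if best is None:  # no single flip lowers the energy: local minimum
            return current
        current, e_cur = best, e_best


def count_valleys(sample: Sequence[Sequence[int]], a, b, W) -> int:
    """Number of distinct local valleys (labelled by their minima) a sample visits."""
    return len({descend(s, a, b, W) for s in sample})


if __name__ == "__main__":
    # Hypothetical 1-visible, 1-hidden RBM with two competing minima.
    a, b, W = [1.0], [1.0], [[-3.0]]
    sample = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(count_valleys(sample, a, b, W))  # → 2
```

Comparing a D-Wave sample with an MCMC sample in the spirit of the abstract would then mean calling `count_valleys` on each and comparing the sets of minima. Each descent step here re-evaluates the full energy for every possible flip, which keeps the code simple but is only practical for small models.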


On Connected Sublevel Sets in Deep Learning

arXiv.org Machine Learning

We study sublevel sets of the loss function in training deep neural networks. For linearly independent data, we prove that every sublevel set of the loss is connected and unbounded. We then apply this result to prove similar properties on the loss surface of deep over-parameterized neural nets with piecewise linear activation functions.
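For reference, the objects in this abstract can be written down as follows. The symbols Φ (training loss), θ (the vector of all p network parameters) and α (the level) are our own notation, not necessarily the paper's, and reading "connected" in the path sense is an illustrative choice on our part.

```latex
% Sublevel set of the training loss \Phi at level \alpha
\Omega_\alpha = \{\theta \in \mathbb{R}^p : \Phi(\theta) \le \alpha\}, \qquad \alpha \in \mathbb{R}.

% Path-connected: any two points can be joined without leaving \Omega_\alpha
\forall\, \theta_0, \theta_1 \in \Omega_\alpha \;\; \exists\, \gamma \in C([0,1]; \mathbb{R}^p) :
\gamma(0) = \theta_0,\; \gamma(1) = \theta_1,\; \Phi(\gamma(t)) \le \alpha \;\; \forall t \in [0,1].

% Unbounded: \Omega_\alpha is not contained in any ball
\forall\, R > 0 \;\; \exists\, \theta \in \Omega_\alpha : \|\theta\| > R.
```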